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ABSTRACT 

An  algorithm,  especially  suitable  for  a computer,  is  presented  for 
carrying  out  Fisher's  two-sample  permutation  test  for  non-negative  integer- 
valued data.  It  is  shown  that  the  same  method  can  be  applied  to  carry  out 
the  permutation  Wilcoxon  test,  using  average  ranks.  Some  numerical  examples 
are  given  and  an  optimal  property  of  the  permutation  test  is  indicated. 
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AN  ALGORITHM  FOR  TMF:  DISCRIiTF  FISHF.R'S  PERMUTATION  TEST 


■k 

And  rew  P . Soms 


1.  INTRODUCTION 

Ihe  concept  of  a permutation  or  randomization  test  originated  with 
Fisher  [s].  An  excellent  discussion  is  given  by  Conover  [z],  pp.  357-64, 
together  with  extensive  references,  and  the  reader  is  referred  to  this  source 
for  the  relevant  details.  Briefly,  let  x^,  l^i^k^,  and  y^,  l^i^k^,  be  the 
observed  values  from  populations  1 and  2,  respectively  (not  necessarily 
distinct),  and  let  x and  y be  the  sample  means.  It  is  desired  to  test  at 
level  a the  hypothesis  that  populations  1 and  2 are  identical  against  the 
alternative  that  population  2 tends  to  produce  smaller  values  than  population  1 

(i.e.,  is  stochastically  smaller).  For  Fisher's  permutation  test,  hereafter 
*■'  called  the  permutation  test,  the  number  N of  samples  of  size  k2  that  can  be 

• 4 

‘i  drawn  from  the  combined  set  x^,  ljf_i<k^,  y^,  l^i^k2,  without  replacement, 

with  sample  mean  less  than  or  equal  to  y,  is  counted  and  the  null  hypothesis 
k 

2 

exactly  the  same  way,  except  that  the  sum  of  the  average  ranks  is  used  in 
place  of  y.  These  procedures  are  especially  appropriate  when  there  are  few 
distinct  values  taken  on  by  the  sample  observations  and  there  are  a substantial 
^ number  of  ties  in  the  data. 


£ a.  The  permutation  Wilcoxon  test  is  carried  out  in 


r'^1^ 

rejected  if  N/ 
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As  an  example,  in  drug  screening  experiments  a control  and  treatment 
group  are  observed  for  a fixed  length  of  time,  at  the  end  of  which  each  sub- 
ject is  assigned  an  integral  numerical  score  0,  1,  ...,  k,  with  0 being  "best" 
or  normal  and  k "worst,"  or  completely  diseased.  It  is  then  desired,  using 
an  exact  test,  to  compare  the  scores  of  the  control  group  against  those  of  the 
treatment,  to  see  if  the  treatment  scores  are  significantly  lower. 


2.  DERIVATION  OF  THE  TESTS 


Let  each  independent  control  score  have  the  same  density 

Pi,  P[Xj^  = i]  = Pj^,  0£i£m,  with  cumulative  distribution  function  F,  and  each 
independent  treatment  score  'i y l£j£k2,  the  same  density  p\  = i]  = p/ , 

O^i^m,  let  kj  and  S^  and  k^  and  S2  be  the  sample  sizes  and  sample  sums  for 
the  control  and  treatment,  respectively,  and  let  the  total  number  of  O's, 

I's,  ...,  k's  observed  be  m^,  ...,  m^^  (here  k is  the  largest  i£m  such  that 
m^  > 03 . It  is  desired  to  test  the  null  hypothesis  F = G against  the  alterna- 
tive F ^ G,  where  F ^ G means  F(i)  ^ G(i),  O^i^m,  with  at  least  one  strict 
inequality  (i.e.,  the  treatment  tends  to  produce  smaller  values,  or 
equivalently,  is  stochastically  smaller  than  X^).  Denote  by  the  number 
of  i's  in  the  treatment,  given  m^,  m^,  ...,  m^.  Then  it  follows  immediately 

that  under  the  null  hypothesis  Pj  = P(  t O^i^m, 

k ^ k /■  m.  ,rk,  +k. 


=n  , 0<i<k,  0<n  <m  , I n =kJ=  n V • 

'•  ^ ^ i=0  ■'  i=0'-  n.-*  k.. 


(1) 


The  tests  here  described  will  be  conditional,  given  m^,  ...,  m^^.  For  k=  1, 

(1)  is  the  well  known  conditional  hypergeometric  distribution  used  for  the 
test  of  the  null  hypothesis  p^  = p^  against  P^^P|  in  two  binomial  popula- 
tions with  parameters  (kj,p^)  and  (k^.p^)  (see,  e.g.,  Lehmann  [S],  pp.  140-5) 


The  permutation  test  here  described  may  be  regarded  as  an  extension  of 
Fisher's  exact  test  when  there  are  k+1  possible  outcomes,  0,  1,  k fk^2), 

with  i > j implying  that  i is  "worse"  than  j.  We  used  the  algorithm  to  be 
described  below  in  this  particular  case  and  found  agreement  with  the  tables 
ill  Bennett  et  al  [l].  The  distribution  in  (1)  is  called  the  multivariate 
hypergeometric,  and  has  been  discussed  by,  e.g..  Van  Eeden  [6]  and  Johnson 
and  Kotz  [4],  pp.  300-2. 

Let  the  significance  level  be  a.  Then  for  the  permutation  test  we  find 

k 

all  those  vectors  n = (n^,  ...,  nj^)  with  ^ in^£S2,  and  if  the  sum  of  the 

i=l 

probabilities  of  these  vectors  n is  less  than  or  equal  to  a,  we  reject  the 

null  hypothesis.  The  procedure  for  the  permutation  Wilcoxon  test  is  the  S2ime, 

except  that  the  sum  of  the  average  ranks  is  used.  Since  this  test  is 

invariant  under  any  transformation  of  the  sample  values  that  preserves  order, 

it  will  always  be  assumed  here  that  m^  > 0,  0£i£k  (if  this  is  not  so,  the 

data  can  be  relabelled  so  that  this  is  true).  The  average  rank  corresponding 
i-1 

to  the  value  i is  ^ m^  + (m^+l)/2  if  i^l  and  (mQ+l)/2  if  i = 0. 

For  the  purpose  of  deriving  a computational  procedure  for  the  descrip- 
tive level  of  significance,  it  is  convenient  to  consider  a class  of  permutation 
tests  of  which  the  two  described  above  are  special  cases.  It  is  assumed  that 
for  each  i,  0£i£k,  there  is  a non-negative  weight  c^,  < Cj^  < ...  and 

that  the  observed  value  for  the  treatment  is  $2  (i.e.,  if  n^^Q,  ijf.i£k2,  are  the 

k 

observed  treatment  frequencies,  then  $2  = ^ ^i"i0^ * problem  then  is  to 

i=0  ^ ^ k k 

find  all  vectors  (n^,  ...,  nj^)  such  that  O^n^^^m^,  J n.  =k2>  and  ^ c^^n^  £ S2 

k i=o  ^ i=0 

and  sum  their  probabilities.  Since  % = ^^2  "i’  ® possible  set  np,  . . . , nj^ 

must  satisfy 


r ^ 


k k 

■ I I 1 * 

^ i=l  i»l  ^ ^ 

k 

° - ‘‘2  • . ^ "i  - ”0  ' 
i»l 


(2) 


The  relations  (2)  are  equivalent  to 


"l  i <®2  - 'o'‘2  - 

k k 

‘2-*0-  J "ii"li'‘2-  J "l  > 


i=2 


® 1.  "l  £ “l  > 


i=2 


2£i£k, 


or  equivalently. 


(3) 


k r k k 

Max(0,k2-mQ-  ^ nj^)  <nj£Min  (S2-Co^2“.^  ’ ’^2'.^  "i’“l 


i»2 


i=2 


i=2 


. (4) 


0<n.  <m.  ,2<i<k. 
— 1 — 1 — — 


In  order  to  continue  the  process  of  obtaining  limits  for  n2»  n^,  ...,  n^^,  in 
terms  of  higher  subscripted  n^'s,  the  following  fact  is  needed.  For  any  real 


nuiri)ers  a.,  l<i<n,  b.,  1 < i < m, 

i'  — — 1 — — 


Max  a^  £ Min  b^ 
l<i<n  l<i<m 


(5) 


( 

• t 


if  and  only  if 


•i  iKj 


for  all  i and  j.  This  allows  us  to  obtain  intervals  for  n2»  n^,  nj^.  We 


-4- 


J 


F 


give  the  results  for  and  explicitly  and  also  give  a general  formula  for 
n^ . The  interval  for  n^  is 

k k 

MaxCO.k^-m^-mp-  J n.)  < n^  < MinCCS^-c^k^-  I (c. -0^)0. )/ (c^-c^)  , (6) 

i=3  i=3 

k k 

(S2-Cik2+(Ci-Co)mo-  I (c. -c^n. )/ (c2-Cj) . k2*  I n . , m2)  . 

i=3  i=3 

The  interval  for  n^  is 

k k 

Max(0,k2-m2-m^-mQ-  f n.)  <n3<Min((S2-Cpk2-  I (c. -CQ)n. )/ (c^-Cp)  . (7) 


CS2-Cik2>(Ci-Co)mo-  I (Ci-Ci)ni)/(C3-Ci)  . 

1=4 

k k 

^ ^^i~^2^^i^^  ^*"3~^2^  ’ ^2~  ^ *^i  ’ *"3^  ’ 
i=4  i=4 

The  general  formula  for  the  limits  on  n^  is  given  below.  The  lower  limit  is 
given  by 

j-1  k 

Max(0,  k-  - m - ^ n.)  < n.  . ({ 

i=0  i=j+l  ^ J 

The  upper  limit  is  the  minimum  of  the  j+2  terms 


k-  - y n.  , 
2 . r , 1 

i=j+l 


r-1  k 

^^2'‘=r’'2*  ^ ^ (c-c  )n.)/(c.-c J,  l<r<j-l  . 

1=0  i=j  + l All  j r 


-5- 


Note  that  (4),  (6),  and  (7)  are  special  cases  of  (8)  and  (9).  For  a given 

k 

value  of  k,  in  (8)  and  (91  'I  is  set  to  0 to  obtain  limits  for  n,  . The 

i=k+l 

general  proof  is  by  induction  and  follows  by  observing  that  in  comparing  the 

j-1  k 

last  j-1  terms  in  (9)  to  k - m.  - ^ n.  to  obtain  the  minimum  expres- 

i=0  ^ i=j+l  ^ 

sions  for  only  the  term  corresponding  to  r=j-l  needs  to  be  retained, 

since  the  j-1  expressions  are  non-increasing  functions  of  r. 


For  the  permutation  test,  c.  =i,  and  for  the  permutation  Wilcoxon  test, 
i-1  ^ 

Cq  = (mQ+l)/2  and  c^=  m.+(m^  + l)/2  for  i^l. 

^ j=0  ^ ^ 


3,  EXAMPLES 

The  advantage  in  obtaining  the  solution  in  the  above  form  is  that  it  is 
ideally  suited  for  a nested  do- loop,  with  nj^  being  the  index  for  the  outermost 
Uj  for  the  innermost  loop,  the  probabilities  being  summed  in  the  innermost 
loop.  We  have  written  a program  for  k=9  that  carries  out  the  permutation  and 
the  permutation  Wilcoxon  test,  and  the  two  examples  (based  on  laboratory  data) 
given  below  were  analyzed  using  this  program.  Note  that  if  a program  has  been 
written  for  k = kp,  it  may  be  used  for  all  simply  by  setting 

"'k+1’  ■■■’  ""k 
Example  1 

In  a screening  experiment  for  a drug,  k = 4,  mQ  = 7,  mj  = 17,  m2  = 19,  '"5  = 4 

m^  = l,  kj^  = 25,  k2  = 23,  and  82  = 21.  The  actual  control  results  were  n^=(0,  6, 

th 

14,  4,  1)  and  the  treatment  n^  = (7,  11,  5,  0,  0),  where  the  i entry  is  the 
number  of  times  i-1  occurs.  Note  that  if  all  combinations  are  examined,  there 
are  28,800  cases  if  no  bounding  is  done;  whereas  if  the  limits  (8)  and  (9)  are 
used,  there  are  only  34  cases  to  consider.  The  probability  of  the  set  which 
gives  a treatment  mean,  or  equivalently,  treatment  sum,  equal  to  or  smaller 
than  the  observed  is  .8794  x 10  ^ and  the  corresponding  probability  for  the 
sum  of  the  average  ranks  is  .9120  x lO"^,  and  thus  it  is  concluded  that  the 


evidence  is  very  strong  that  there  is  a causal  mechanism  depressing  the  scores 
of  the  treatment  group. 


E.xample  2 

In  a screening  experiment  for  a drug,  a control  and  two  treatments  were 

used,  the  control  results  being  n^  = (0,  0,  10,  13,  2),  and  the  treatments 

n^j^  = (1,  5,  11,  6,  0)  and  n^2  “ 23,  1,  0).  Note  that  the  mean  score 

for  treatment  1 is  1.96  and  for  treatment  2,  2.00,  the  control  mean  being  2.68. 

However,  for  the  permutation  test  the  probability  associated  with  treatment  1 

is  .1141  X 10  while  for  treatment  2 it  is  .1067  x 10  and  for  the  permu- 

-2 

tation  Wilcoxon  test  the  corresponding  probabilities  are  .1344  x 10  and 

4 

-4 

.1067  x 10  . Thus  the  descriptive  levels  can  go  in  the  opposite  direction  to 

the  mean  scores  or  sum  of  average  ranks. 

4.  CONCLUDING  REMARKS 

The  algorithm  for  the  permutation  and  the  permutation  Wilcoxon  tests 
here  described  is  most  efficient  when  there  are  a small  number  of  values  taken 

h 

on  by  the  data.  This  is  also,  of  course,  the  situation  where  the  effect  of 
ties  is  of  the  greatest  concern.  For  the  permutation  Wilcoxon  test  it  was 
shown  above  that  it  could  be  assumed  that  the  data  has  values  0,  1,  ...,  k 
with  m^>0,  0£i^k.  For  the  permutation  test,  by  a change  in  location  and 

•a 

•>  scale  (the  test  is  invariant  with  respect  to  these  transformations),  the  data 

can  be  transformed  so  that  the  smallest  value  is  0 and  all  the  values  integral 

J (possibly  with  gaps).  Then  the  above  procedure  is  applicable  if  m^  is  now 

defined  to  be  the  number  of  sample  values  that  equal  the  i+1^^  largest  data 

s t 

^ value,  with  a total  of  k+1  possible,  and  c^  is  the  i+1  largest  data  value. 

• t 

Thus  there  is  no  loss  of  generality  in  restricting  the  sample  values  to  be 
,,  0,  1,  ...,  k.  We  have  written  a short  Fortran  computer  program,  a listing  of 

which  is  available  on  request,  which  carries  out  the  permutation  and  the 

1 

-7- 


I 

I 


permutation  Wilcoxon  tests  for  at  most  10  distinct  data  values.  If  .1, 

then  in  order  to  have  a reasonable  running  time,  the  approximate  restrictions  on 
the  total  sample  size  are:  for  10  distinct  values,  ^ 50;  for  8, 

^ 80;  for  6,  _<  150;  and  for  5,  ^ 250.  Thus  if  there  are  5 or  fewer  distinct 

values  taken  on  by  the  data,  then  the  algorithm  here  described  can  be  used  for 
all  sample  sizes  encountered  in  practice.  If  the  data  has  more  than  10  distinct 
values,  then  extreme  tail  probabilities  may  still  be  calculated  using  the  above 
procedure. 

The  randomized  versions  of  the  two  tests  here  discussed  are  unbiased 
against  the  alternatives  F £ G.  This  follows  from  Lemma  1,  p.  73,  of  [5].  In 
addition,  by  using  a similar  argument  to  that  of  [5],  pp.  185-8,  for  the 
continuous  case,  it  follows  that  the  permutation  test  is  uniformly  most  power- 
ful in  the  class  of  all  unbiased  tests  of  F = G against  F _<  G for  the 
alternatives  p^  = c(0Q)e  h(i),  p^  = c(0^)e  h(i),  0^^  < 0^,  0 _<  i £ m. 

In  sections  1 and  2 the  permutation  test  and  the  permutation  Wilcoxon 
test  were  developed  based  on  population  models.  They  can  also  be  interpreted 
as  randomization  tests  on  the  k^^  + k2  experimental  units  provided  that 
each  combination  of  experimental  units  has  the  same  probability  of  being 
included  in  the  control  group.  The  latter  case  is,  in  fact,  the  most  reason- 
able assumption  for  the  examples  discussed. 

Many  other  types  of  biological  experiments  also  result  in  data  which 
can  be  analyzed  by  the  method  here  proposed  when  it  is  desired  to  compare  a 
treatment  to  a control  and  the  data  has  many  ties.  Some  examples  are  the 


number  of  survival  days  and  the  number  of  malformed  fetuses  in  a litter. 
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